AITopics | data condition

Collaborating Authors

data condition

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

On Mesa-Optimization in Autoregressively Trained Transformers: Emergence and Capability

Neural Information Processing SystemsMar-20-2026, 17:35:53 GMT

Autoregressively trained transformers have brought a profound revolution to the world, especially with their in-context learning (ICL) ability to address downstream tasks. Recently, several studies suggest that transformers learn a mesa-optimizer during autoregressive (AR) pretraining to implement ICL. Namely, the forward pass of the trained transformer is equivalent to optimizing an inner objective function in-context.However, whether the practical non-convex training dynamics will converge to the ideal mesa-optimizer is still unclear.Towards filling this gap, we investigate the non-convex dynamics of a one-layer linear causal self-attention model autoregressively trained by gradient flow, where the sequences are generated by an AR process $x_{t+1} = W x_t$. First, under a certain condition of data distribution, we prove that an autoregressively trained transformer learns $W$ by implementing one step of gradient descent to minimize an ordinary least squares (OLS) problem in-context. It then applies the learned $\widehat{W}$ for next-token prediction, thereby verifying the mesa-optimization hypothesis. Next, under the same data conditions, we explore the capability limitations of the obtained mesa-optimizer. We show that a stronger assumption related to the moments of data is the sufficient and necessary condition that the learned mesa-optimizer recovers the distribution. Besides, we conduct exploratory analyses beyond the first data condition and prove that generally, the trained transformer will not perform vanilla gradient descent for the OLS problem. Finally, our simulation results verify the theoretical results.

artificial intelligence, machine learning, proceedings, (6 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.61)

Add feedback

Establishing Validity for Distance Functions and Internal Clustering Validity Indices in Correlation Space

Degen, Isabella, Abdallah, Zahraa S, Brown, Kate Robson, Reeve, Henry W J

arXiv.org Artificial IntelligenceDec-8-2025

Internal clustering validity indices (ICVIs) assess clustering quality without ground truth labels. Comparative studies consistently find that no single ICVI outperforms others across datasets, leaving practitioners without principled ICVI selection. We argue that inconsistent ICVI performance arises because studies evaluate them based on matching human labels rather than measuring the quality of the discovered structure in the data, using datasets without formally quantifying the structure type and quality. Structure type refers to the mathematical organisation in data that clustering aims to discover. Validity theory requires a theoretical definition of clustering quality, which depends on structure type. We demonstrate this through the first validity assessment of clustering quality measures for correlation patterns, a structure type that arises from clustering time series by correlation relationships. We formalise 23 canonical correlation patterns as the theoretical optimal clustering and use synthetic data modelling this structure with controlled perturbations to evaluate validity across content, criterion, construct, and external validity. Our findings show that Silhouette Width Criterion (SWC) and Davies-Bouldin Index (DBI) are valid for correlation patterns, whilst Calinski-Harabasz (VRC) and Pakhira-Bandyopadhyay-Maulik (PBM) indices fail. Simple Lp norm distances achieve validity, whilst correlation-specific functions fail structural, criterion, and external validity. These results differ from previous studies where VRC and PBM performed well, demonstrating that validity depends on structure type. Our structure-type-specific validation method provides both practical guidance (quality thresholds SWC>0.9, DBI<0.15) and a methodological template for establishing validity for other structure types.

artificial intelligence, data mining, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2507.16497

Country: North America > United States > California (0.27)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study > Negative Result (0.67)

Industry:

Health & Medicine (0.67)
Banking & Finance (0.45)

Technology:

Information Technology > Data Science > Data Mining (0.92)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.45)

Add feedback

Efficient Conformance Checking of Rich Data-Aware Declare Specifications (Extended)

Casas-Ramos, Jacobo, Winkler, Sarah, Gianola, Alessandro, Montali, Marco, Mucientes, Manuel, Lama, Manuel

arXiv.org Artificial IntelligenceJul-2-2025

Despite growing interest in process analysis and mining for data-aware specifications, alignment-based conformance checking for declarative process models has focused on pure control-flow specifications, or mild data-aware extensions limited to numerical data and variable-to-constant comparisons. This is not surprising: finding alignments is computationally hard, even more so in the presence of data dependencies. In this paper, we challenge this problem in the case where the reference model is captured using data-aware Declare with general data types and data conditions. We show that, unexpectedly, it is possible to compute data-aware optimal alignments in this rich setting, enjoying at once efficiency and expressiveness. This is achieved by carefully combining the two best-known approaches to deal with control flow and data dependencies when computing alignments, namely A* search and SMT solving. Specifically, we introduce a novel algorithmic technique that efficiently explores the search space, generating descendant states through the application of repair actions aiming at incrementally resolving constraint violations. We prove the correctness of our algorithm and experimentally show its efficiency. The evaluation witnesses that our approach matches or surpasses the performance of the state of the art while also supporting significantly more expressive data dependencies, showcasing its potential to support real-world applications.

alignment, artificial intelligence, constraint-based reasoning, (15 more...)

arXiv.org Artificial Intelligence

2507.00094

Country:

Europe > Portugal > Lisbon > Lisbon (0.14)
South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
Europe > Spain > Galicia > A Coruña Province > Santiago de Compostela (0.04)
(2 more...)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Constraint-Based Reasoning (0.88)

Add feedback

On Mesa-Optimization in Autoregressively Trained Transformers: Emergence and Capability

Neural Information Processing SystemsMay-27-2025, 01:57:21 GMT

Autoregressively trained transformers have brought a profound revolution to the world, especially with their in-context learning (ICL) ability to address downstream tasks. Recently, several studies suggest that transformers learn a mesa-optimizer during autoregressive (AR) pretraining to implement ICL. Namely, the forward pass of the trained transformer is equivalent to optimizing an inner objective function in-context.However, whether the practical non-convex training dynamics will converge to the ideal mesa-optimizer is still unclear.Towards filling this gap, we investigate the non-convex dynamics of a one-layer linear causal self-attention model autoregressively trained by gradient flow, where the sequences are generated by an AR process x_{t 1} W x_t . First, under a certain condition of data distribution, we prove that an autoregressively trained transformer learns W by implementing one step of gradient descent to minimize an ordinary least squares (OLS) problem in-context. It then applies the learned \widehat{W} for next-token prediction, thereby verifying the mesa-optimization hypothesis. Next, under the same data conditions, we explore the capability limitations of the obtained mesa-optimizer.

artificial intelligence, autoregressively, machine learning, (5 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.45)

Add feedback

SPIEDiff: robust learning of long-time macroscopic dynamics from short-time particle simulations with quantified epistemic uncertainty

He, Zequn, Reina, Celia

arXiv.org Artificial IntelligenceMay-21-2025

The data-driven discovery of long-time macroscopic dynamics and thermodynamics of dissipative systems with particle fidelity is hampered by significant obstacles. These include the strong time-scale limitations inherent to particle simulations, the non-uniqueness of the thermodynamic potentials and operators from given macroscopic dynamics, and the need for efficient uncertainty quantification. This paper introduces Statistical-Physics Informed Epistemic Diffusion Models (SPIEDiff), a machine learning framework designed to overcome these limitations in the context of purely dissipative systems by leveraging statistical physics, conditional diffusion models, and epinets. We evaluate the proposed framework on stochastic Arrhenius particle processes and demonstrate that SPIEDiff can accurately uncover both thermodynamics and kinetics, while enabling reliable long-time macroscopic predictions using only short-time particle simulation data. SPIEDiff can deliver accurate predictions with quantified uncertainty in minutes, drastically reducing the computational demand compared to direct particle simulations, which would take days or years in the examples considered. Overall, SPIEDiff offers a robust and trustworthy pathway for the data-driven discovery of thermodynamic models.

artificial intelligence, deep learning, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2505.13501

Country:

Europe (0.67)
North America > United States (0.46)

Genre: Research Report > New Finding (0.93)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)

Add feedback

Spatio-Temporal Jump Model for Urban Thermal Comfort Monitoring

Cortese, Federico P., Pievatolo, Antonio

arXiv.org Artificial IntelligenceNov-18-2024

Thermal comfort is essential for well-being in urban spaces, especially as cities face increasing heat from urbanization and climate change. Existing thermal comfort models usually overlook temporal dynamics alongside spatial dependencies. We address this problem by introducing a spatio-temporal jump model that clusters data with persistence across both spatial and temporal dimensions. This framework enhances interpretability, minimizes abrupt state changes, and easily handles missing data. We validate our approach through extensive simulations, demonstrating its accuracy in recovering the true underlying partition. When applied to hourly environmental data gathered from a set of weather stations located across the city of Singapore, our proposal identifies meaningful thermal comfort regimes, demonstrating its effectiveness in dynamic urban settings and suitability for real-world monitoring. The comparison of these regimes with feedback on thermal preference indicates the potential of an unsupervised approach to avoid extensive surveys.

comfort regime, regime, thermal comfort, (16 more...)

arXiv.org Artificial Intelligence

2411.09726

Country:

Asia > Singapore (0.25)
Europe > Austria > Vienna (0.14)
North America > United States > Illinois > Cook County > Chicago (0.04)
(6 more...)

Genre: Research Report > New Finding (0.46)

Industry:

Health & Medicine (1.00)
Construction & Engineering (1.00)
Banking & Finance > Trading (0.46)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.46)

Add feedback

Training Generative Adversarial Network-Based Vocoder with Limited Data Using Augmentation-Conditional Discriminator

Kaneko, Takuhiro, Kameoka, Hirokazu, Tanaka, Kou

arXiv.org Artificial IntelligenceMar-25-2024

A generative adversarial network (GAN)-based vocoder trained with an adversarial discriminator is commonly used for speech synthesis because of its fast, lightweight, and high-quality characteristics. However, this data-driven model requires a large amount of training data incurring high data-collection costs. This fact motivates us to train a GAN-based vocoder on limited data. A promising solution is to augment the training data to avoid overfitting. However, a standard discriminator is unconditional and insensitive to distributional changes caused by data augmentation. Thus, augmented speech (which can be extraordinary) may be considered real speech. To address this issue, we propose an augmentation-conditional discriminator (AugCondD) that receives the augmentation state as input in addition to speech, thereby assessing the input speech according to the augmentation state, without inhibiting the learning of the original non-augmented distribution. Experimental results indicate that AugCondD improves speech quality under limited data conditions while achieving comparable speech quality under sufficient data conditions. Audio samples are available at https://www.kecl.ntt.co.jp/people/kaneko.takuhiro/projects/augcondd/.

augcondd, augmentation, speech, (16 more...)

arXiv.org Artificial Intelligence

2403.16464

Country:

Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)
Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.04)

Genre:

Research Report > New Finding (0.46)
Research Report > Promising Solution (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.71)
Information Technology > Artificial Intelligence > Machine Learning > Unsupervised or Indirectly Supervised Learning (0.61)

Add feedback

Scalable imputation of genetic data with a discrete fragmentation coagulation process

Neural Information Processing SystemsMar-14-2024, 11:03:49 GMT

We present a Bayesian nonparametric model for genetic sequence data in which a set of genetic sequences is modelled using a Markov model of partitions. The partitions at consecutive locations in the genome are related by the splitting and merging of their clusters. Our model can be thought of as a discrete analogue of the continuous fragmentation-coagulation process [Teh et al 2011], preserving the important properties of projectivity, exchangeability and reversibility, while being more scalable. We apply this model to the problem of genotype imputation, showing improved computational efficiency while maintaining accuracies comparable to other state-of-the-art genotype imputation methods.

dfcp, partition, sequence, (16 more...)

Neural Information Processing Systems

Country: Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.95)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.67)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.66)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.66)

Add feedback

RadImageGAN -- A Multi-modal Dataset-Scale Generative AI for Medical Imaging

Liu, Zelong, Zhou, Alexander, Yang, Arnold, Yilmaz, Alara, Yoo, Maxwell, Sullivan, Mikey, Zhang, Catherine, Grant, James, Li, Daiqing, Fayad, Zahi A., Huver, Sean, Deyer, Timothy, Mei, Xueyan

arXiv.org Artificial IntelligenceDec-10-2023

Deep learning in medical imaging often requires large-scale, high-quality data or initiation with suitably pre-trained weights. However, medical datasets are limited by data availability, domain-specific knowledge, and privacy concerns, and the creation of large and diverse radiologic databases like RadImageNet is highly resource-intensive. To address these limitations, we introduce RadImageGAN, the first multi-modal radiologic data generator, which was developed by training StyleGAN-XL on the real RadImageNet dataset of 102,774 patients. RadImageGAN can generate high-resolution synthetic medical imaging datasets across 12 anatomical regions and 130 pathological classes in 3 modalities. Furthermore, we demonstrate that RadImageGAN generators can be utilized with BigDatasetGAN to generate multi-class pixel-wise annotated paired synthetic images and masks for diverse downstream segmentation tasks with minimal manual annotation. We showed that using synthetic auto-labeled data from RadImageGAN can significantly improve performance on four diverse downstream segmentation datasets by augmenting real training data and/or developing pre-trained weights for fine-tuning. This shows that RadImageGAN combined with BigDatasetGAN can improve model performance and address data scarcity while reducing the resources needed for annotations for segmentation tasks.

augmentation, dataset, real data, (14 more...)

arXiv.org Artificial Intelligence

2312.05953

Country:

North America > United States > New York > New York County > New York City (0.04)
North America > United States > Illinois > Cook County > Chicago (0.04)
North America > United States > California > Santa Clara County > Santa Clara (0.04)
Asia > Japan > Kyūshū & Okinawa > Kyūshū > Fukuoka Prefecture > Fukuoka (0.04)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Health & Medicine > Health Care Technology (1.00)
Health & Medicine > Diagnostic Medicine > Imaging (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.40)

Add feedback

Learning an artificial language for knowledge-sharing in multilingual translation

AIHubDec-7-2022, 12:52:52 GMT

In their recent paper Learning an artificial language for knowledge-sharing in multilingual translation, Danni Liu and Jan Niehues investigate multilingual neural machine translation models. Here, they tell us more about the main contributions of their research. Neural machine translation (NMT) is the backbone of many automatic translation platforms nowadays. The second characteristic is especially useful in low-resource conditions, where training data (translated sentence pairs) are limited. To enable knowledge-sharing between languages, and to improve translation quality on low-resource translation directions, a precondition is the ability to capture common features between languages.

artificial language, multilingual translation, translation, (15 more...)

AIHub

Technology: Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)

Add feedback